A practical approach for incorporating dependence among fields in probabilistic record linkage
Abstract
BACKGROUND: Methods for linking real-world healthcare data often use a latent class model, where the latent, or unknown, class is the true match status of candidate record-pairs. This commonly used model assumes that agreement patterns among multiple fields within a latent class are independent. When this assumption is violated, various approaches, including the most commonly proposed loglinear models, have been suggested to account for conditional dependence.

METHODS: We present a step-by-step guide to identify important dependencies between fields through a correlation residual plot and demonstrate how they can be incorporated into loglinear models for record linkage. This method is applied to healthcare data from the patient registry for a large county health department.

RESULTS: Our method could be readily implemented using standard software (with code supplied) to produce an overall better model fit as measured by BIC and deviance. Finding the most parsimonious model is known to reduce bias in parameter estimates.

CONCLUSIONS: This novel approach identifies and accommodates conditional dependence in the context of record linkage. The conditional dependence model is recommended for routine use due to its flexibility for incorporating conditional dependence and easy implementation using existing software.
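To make the idea of a correlation residual concrete, the following is a minimal Python sketch, not the code supplied with the paper. It assumes that a correlation residual is the observed pairwise correlation between field-agreement indicators minus the correlation implied by a fitted two-class latent class model with conditional independence. The function names (`fit_independence_em`, `correlation_residuals`, `simulate_pairs`), the starting values, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_independence_em(X, n_iter=500, tol=1e-9):
    """EM fit of a two-class latent class model with conditionally
    independent binary agreement fields (illustrative sketch).
    X has shape (n_pairs, n_fields) with 0/1 entries."""
    X = np.asarray(X, dtype=float)
    p = 0.1                           # assumed starting match prevalence
    m = np.full(X.shape[1], 0.9)      # P(field agrees | match)
    u = np.full(X.shape[1], 0.1)      # P(field agrees | non-match)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match
        log_match = np.log(p) + (X * np.log(m) + (1 - X) * np.log(1 - m)).sum(axis=1)
        log_non = np.log(1 - p) + (X * np.log(u) + (1 - X) * np.log(1 - u)).sum(axis=1)
        log_lik = np.logaddexp(log_match, log_non)
        g = np.exp(log_match - log_lik)
        # M-step: weighted proportions, clipped away from 0 and 1
        p = np.clip(g.mean(), 1e-6, 1 - 1e-6)
        m = np.clip(g @ X / g.sum(), 1e-6, 1 - 1e-6)
        u = np.clip((1 - g) @ X / (1 - g).sum(), 1e-6, 1 - 1e-6)
        total = log_lik.sum()
        if total - prev < tol:
            break
        prev = total
    return p, m, u


def correlation_residuals(X, p, m, u):
    """Observed minus model-implied pairwise correlations between fields."""
    X = np.asarray(X, dtype=float)
    observed = np.corrcoef(X, rowvar=False)
    mean = p * m + (1 - p) * u
    joint = p * np.outer(m, m) + (1 - p) * np.outer(u, u)
    sd = np.sqrt(mean * (1 - mean))
    expected = (joint - np.outer(mean, mean)) / np.outer(sd, sd)
    resid = observed - expected
    np.fill_diagonal(resid, 0.0)      # the diagonal carries no information
    return resid


def simulate_pairs(n, dependent, seed=0):
    """Simulate 4 agreement fields; optionally make field 1 depend on
    field 0 among true matches (hypothetical data, not from the paper)."""
    rng = np.random.default_rng(seed)
    is_match = rng.random(n) < 0.2
    prob = np.where(is_match[:, None], 0.9, 0.1)
    X = (rng.random((n, 4)) < prob).astype(int)
    if dependent:
        copy = is_match & (rng.random(n) < 0.7)
        X[copy, 1] = X[copy, 0]
    return X


if __name__ == "__main__":
    X = simulate_pairs(20000, dependent=True, seed=1)
    p, m, u = fit_independence_em(X)
    print(np.round(correlation_residuals(X, p, m, u), 3))
```

Field pairs with large residuals are candidates for an interaction term in a loglinear model. Fitting that model and comparing BIC and deviance is not shown here.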
Similar resources
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in identifying duplicates within a data set. The i...
A Hierarchical Graphical Model for Record Linkage
The task of matching co-referent records is known among other names as record linkage. For large record-linkage problems, often there is little or no labeled data available, but unlabeled data shows a reasonably clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised methods. In this paper, we describe a hierarchical graphical model framework for...
Practical introduction to record linkage for injury research.
The frequency of early fatality and the transient nature of emergency medical care mean that a single database will rarely suffice for population based injury research. Linking records from multiple data sources is therefore a promising method for injury surveillance or trauma system evaluation. The purpose of this article is to review the historical development of record linkage, provide a bas...
A Conceptual Algorithm to Link Police and Hospital Records Based on Occurrence of Values
Road safety research, in particular road and traffic safety evaluation research, is highly applied and carried out mostly to help reduce the number of road accidents and the injuries resulting from them. This subject has been continuously studied, and in developed countries road safety has improved to the point that new measures have an increasingly less visible impact. Although measures are usual...
FACTS Devices Allocation to Congestion Alleviation Incorporating Voltage Dependence of Loads
This paper presents a novel optimization-based methodology for allocating Flexible AC Transmission Systems (FACTS) devices, in an attempt to improve on previous research in this field. Static voltage stability enhancement, voltage profile improvement, line congestion alleviation, and FACTS devices investment cost reduction have been considered simultaneously as objective funct...